Architecture and Evaluation of an Asynchronous Array of Simple Processors

Authors

  • Zhiyi Yu
  • Michael J. Meeuwsen
  • Ryan W. Apperson
  • Omar Sattari
  • Michael A. Lai
  • Jeremy W. Webb
  • Eric W. Work
  • Tinoosh Mohsenin
  • Bevan M. Baas
Abstract

This paper presents the architecture of an asynchronous array of simple processors (AsAP), and evaluates its key architectural features as well as its performance and energy efficiency. The AsAP processor computes DSP applications with high energy efficiency, is capable of high performance, is easily scalable, and is well suited to future fabrication technologies. It is composed of a two-dimensional array of simple single-issue programmable processors interconnected by a reconfigurable mesh network. Processors are designed to capture the kernels of many DSP algorithms with very little additional overhead. Each processor contains its own tunable and haltable clock oscillator, and processors operate completely asynchronously with respect to each other in a globally asynchronous locally synchronous (GALS) fashion. A 6 × 6 AsAP array has been designed and fabricated in a 0.18 μm CMOS technology. Each processor occupies 0.66 mm², is fully functional at a clock rate of 520–540 MHz at 1.8 V, and dissipates an average of 35 mW per processor at 520 MHz under typical conditions while executing applications such as a JPEG encoder core and a complete IEEE 802.11a/g wireless LAN baseband transmitter. Most processors operate at over 600 MHz at 2.0 V. Processors dissipate 2.4 mW at 116 MHz and 0.9 V. A single AsAP processor occupies 4% or less of the area of a single processing element in other multi-processor chips. Compared to several RISC processors (single-issue MIPS and ARM), AsAP achieves performance 27–275 times greater and energy efficiency 96–215 times greater, while using far less area. Compared to the TI C62x high-end DSP processor, AsAP achieves performance 0.8–9.6 times greater and energy efficiency 10–75 times greater, with an area 7–19 times smaller. Compared to ASIC implementations, AsAP achieves performance within a factor of 2–5 and energy efficiency within a factor of 3–50, with area within a factor of 2.5–3. These data are for varying numbers of AsAP processors per benchmark.

Author affiliation: ECE Department, UC Davis, Davis, CA 95616, USA.

Similar Resources

A Regular VLSI Array for an Irregular Algorithm

We present an application-specific, asynchronous VLSI processor array for the dynamic programming algorithm for the 0/1 knapsack problem. The array is derived systematically, using correctness-preserving transformations, in two steps: the standard (dense) algorithm is first transformed into an irregular (sparse) functional program which has better efficiency. This program is then implemented as ...

High Performance and Energy Efficient Multi-core Systems for DSP Applications

This dissertation investigates the architecture design, physical implementation, result evaluation, and feature analysis of a multi-core processor for DSP applications. The system is composed of a 2-D array of simple single-issue programmable processors interconnected by a reconfigurable mesh network, and processors operate completely asynchronously with respect to each other in a Globally Asyn...

FPGA Implementation and Validation of the Asynchronous Array of simple Processors

In today’s ASIC market, validating a VLSI design in a field programmable gate array (FPGA) device before tape-out can save a significant amount of time, and therefore money, up front in the design process. At the University of California, Davis, we are working on the design of a highly parallel, reconfigurable processor chip, known as the Asynchronous Array of simple Processors (AsAP). This repor...

Evaluating Large System-on-Chip on Multi-FPGA Platform

This paper presents a configurable base architecture that can be tailored to different applications. It offers a simple and rapid way to evaluate and prototype large Multi-Processor System-on-Chip architectures on multiple FPGAs, with support for the Globally Asynchronous Locally Synchronous scheme. It allows early hardware/software co-verification and optimization. The architecture abstracts the underlying har...

An Energy-efficient Parallel H.264/AVC Baseline Encoder on a Fine-grained Many-core System

The emerging many-core architecture provides a flexible solution for rapidly evolving multimedia applications that demand both high performance and high energy efficiency. However, developing parallel multimedia applications that can efficiently harness many-core architectures is the key challenge for scalable computing. We contribute to this challenge by presenting a fully-parallel ...

Journal:
  • Signal Processing Systems

Volume 53

Publication date: 2008